This evaluation covers 7 datasets in total. We evaluated them with the metrics implemented in the OmicsEV package. The table below shows the number of samples in each class for each dataset.
| class | Array | d1 | d2 | d3 | d4 | d5 | d6 |
|---|---|---|---|---|---|---|---|
| Basal | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
| Her2 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| LumA | 19 | 19 | 19 | 19 | 19 | 19 | 19 |
| LumB | 22 | 22 | 22 | 22 | 22 | 22 | 22 |
| None | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
The detailed sample information is shown below.
| sample | class | batch | order |
|---|---|---|---|
| TCGA.A2.A0CM | Basal | 1 | 1 |
| TCGA.A2.A0D0 | Basal | 1 | 2 |
| TCGA.A2.A0D1 | None | 1 | 3 |
| TCGA.A2.A0D2 | Basal | 1 | 4 |
| TCGA.A2.A0EQ | Her2 | 1 | 5 |
| TCGA.A2.A0EV | LumA | 1 | 6 |
| TCGA.A2.A0EX | LumA | 1 | 7 |
| TCGA.A2.A0EY | LumB | 1 | 8 |
| TCGA.A2.A0SW | LumB | 1 | 9 |
| TCGA.A2.A0SX | Basal | 1 | 10 |
| TCGA.A2.A0T1 | Her2 | 1 | 11 |
| TCGA.A2.A0T2 | Basal | 1 | 12 |
| TCGA.A2.A0T6 | LumA | 1 | 13 |
| TCGA.A2.A0T7 | LumA | 1 | 14 |
| TCGA.A2.A0YC | LumA | 1 | 15 |
| TCGA.A2.A0YD | LumA | 1 | 16 |
| TCGA.A2.A0YF | LumA | 1 | 17 |
| TCGA.A2.A0YG | LumB | 1 | 18 |
| TCGA.A2.A0YI | LumA | 1 | 19 |
| TCGA.A2.A0YL | LumA | 1 | 20 |
| TCGA.A2.A0YM | Basal | 1 | 21 |
| TCGA.A7.A0CD | LumA | 1 | 22 |
| TCGA.A7.A0CE | Basal | 1 | 23 |
| TCGA.A7.A0CJ | LumB | 1 | 24 |
| TCGA.A8.A06N | LumB | 1 | 25 |
| TCGA.A8.A06Z | LumB | 1 | 26 |
| TCGA.A8.A076 | LumB | 1 | 27 |
| TCGA.A8.A079 | LumB | 1 | 28 |
| TCGA.A8.A09G | Her2 | 1 | 29 |
| TCGA.A8.A09I | LumB | 1 | 30 |
| TCGA.AN.A04A | None | 1 | 31 |
| TCGA.AN.A0AJ | LumB | 1 | 32 |
| TCGA.AN.A0AL | Basal | 1 | 33 |
| TCGA.AN.A0AM | LumB | 1 | 34 |
| TCGA.AN.A0AS | LumA | 1 | 35 |
| TCGA.AN.A0FK | LumA | 1 | 36 |
| TCGA.AN.A0FL | Basal | 1 | 37 |
| TCGA.AO.A03O | None | 1 | 38 |
| TCGA.AO.A0J6 | None | 1 | 39 |
| TCGA.AO.A0J9 | None | 1 | 40 |
| TCGA.AO.A0JC | None | 1 | 41 |
| TCGA.AO.A0JE | None | 1 | 42 |
| TCGA.AO.A0JJ | None | 1 | 43 |
| TCGA.AO.A0JL | None | 1 | 44 |
| TCGA.AO.A0JM | None | 1 | 45 |
| TCGA.AO.A126 | None | 1 | 46 |
| TCGA.AO.A12B | None | 1 | 47 |
| TCGA.AO.A12E | None | 1 | 48 |
| TCGA.AR.A0TR | LumA | 1 | 49 |
| TCGA.AR.A0TT | LumB | 1 | 50 |
| TCGA.AR.A0TV | LumB | 1 | 51 |
| TCGA.AR.A0TX | Her2 | 1 | 52 |
| TCGA.AR.A0U4 | None | 1 | 53 |
| TCGA.BH.A0EE | Her2 | 1 | 54 |
| TCGA.BH.A0HP | LumA | 1 | 55 |
| TCGA.A2.A0T3 | LumB | 1 | 56 |
| TCGA.A7.A13F | LumB | 1 | 57 |
| TCGA.AO.A12D | None | 1 | 58 |
| TCGA.AO.A12F | None | 1 | 59 |
| TCGA.AR.A0TY | LumB | 1 | 60 |
| TCGA.AR.A1AQ | Basal | 1 | 61 |
| TCGA.AR.A1AV | LumA | 1 | 62 |
| TCGA.AR.A1AW | LumB | 1 | 63 |
| TCGA.BH.A0AV | Basal | 1 | 64 |
| TCGA.BH.A0C1 | LumA | 1 | 65 |
| TCGA.BH.A0C7 | LumB | 1 | 66 |
| TCGA.BH.A0E9 | LumA | 1 | 67 |
| TCGA.C8.A12L | Her2 | 1 | 68 |
| TCGA.C8.A12P | Her2 | 1 | 69 |
| TCGA.C8.A12Q | Her2 | 1 | 70 |
| TCGA.C8.A12T | Her2 | 1 | 71 |
| TCGA.C8.A12U | LumB | 1 | 72 |
| TCGA.C8.A12V | Basal | 1 | 73 |
| TCGA.C8.A12W | LumB | 1 | 74 |
| TCGA.C8.A12Z | Her2 | 1 | 75 |
| TCGA.C8.A130 | LumB | 1 | 76 |
| TCGA.C8.A131 | Basal | 1 | 77 |
| TCGA.C8.A134 | Basal | 1 | 78 |
| TCGA.C8.A135 | Her2 | 1 | 79 |
| TCGA.C8.A138 | Her2 | 1 | 80 |
| TCGA.D8.A13Y | LumB | 1 | 81 |
| TCGA.D8.A142 | Basal | 1 | 82 |
| TCGA.E2.A10A | LumA | 1 | 83 |
| TCGA.E2.A150 | Basal | 1 | 84 |
| TCGA.E2.A154 | LumA | 1 | 85 |
| TCGA.E2.A159 | Basal | 1 | 86 |
| dataSet | # proteins (genes) | # proteins (genes) [50%] | complex_ks | gene_wise_cor | sample_wise_cor | AUROC | func_auc |
|---|---|---|---|---|---|---|---|
| Array | 17814 | 17814 | 0.1751827 | 0.3020985 | 0.1970433 | 0.9102871 | 0.8098193 |
| d1 | 20501 | 18694 | 0.2415586 | 0.3293080 | 0.1420492 | 0.9952153 | 0.8219823 |
| d2 | 20501 | 18717 | 0.2128561 | 0.3348777 | 0.1421854 | 0.9928230 | 0.7950738 |
| d3 | 20501 | 18694 | 0.2890523 | 0.3354781 | 0.1420492 | 0.9868421 | 0.8352795 |
| d4 | 20501 | 18694 | 0.2822335 | 0.3368288 | 0.1420492 | 0.9868421 | 0.8284495 |
| d5 | 20501 | 18694 | 0.2108862 | 0.3208170 | 0.1420492 | 0.9928230 | 0.8257509 |
| d6 | 20501 | 18694 | 0.2436248 | 0.3279982 | 0.1381387 | 0.9880383 | 0.8158570 |
The table below shows the number of proteins or genes identified in each dataset. Proteins or genes that remain after filtering at 50% missing values are counted as quantified (column "[50%]").
| dataSet | # proteins (genes) | # proteins (genes) [50%] |
|---|---|---|
| Array | 17814 | 17814 |
| d1 | 20501 | 18694 |
| d2 | 20501 | 18717 |
| d3 | 20501 | 18694 |
| d4 | 20501 | 18694 |
| d5 | 20501 | 18694 |
| d6 | 20501 | 18694 |
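The 50% missing-value filter described above can be illustrated with a minimal sketch. It is not the OmicsEV implementation: the function names, the toy data, and the choice to keep a feature whose missing fraction is exactly 50% are all assumptions, since the report does not say whether the threshold is inclusive.

```python
import math


def missing_fraction(values):
    """Fraction of entries that are missing (None or NaN)."""
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def filter_by_missing(matrix, max_missing=0.5):
    """Keep features whose missing fraction is at most max_missing.

    matrix maps a feature name to its list of per-sample values.
    The inclusive comparison (<=) is an assumption.
    """
    return {
        name: vals for name, vals in matrix.items()
        if missing_fraction(vals) <= max_missing
    }


# Toy data with hypothetical values, 4 samples per feature.
toy = {
    "geneA": [1.2, 3.4, 2.2, 0.9],               # 0% missing: kept
    "geneB": [None, 2.0, None, 1.1],             # 50% missing: kept
    "geneC": [None, None, float("nan"), 5.0],    # 75% missing: dropped
}
kept = filter_by_missing(toy)
print(sorted(kept))
```

Counting the keys of `kept` would then give the analogue of the "[50%]" column for one dataset.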
The UpSet chart below shows the overlap in proteins or genes identified in each dataset. The top bar chart gives the number of identified proteins or genes shared between datasets, and the solid points below it mark which datasets belong to each set. Total identifications for each dataset are shown on the left as ‘Set size’.
The figures below show the number of proteins or genes identified in each sample. The samples from different batches are coded in different shapes and the samples from different classes are coded in different colors.
[Figures: one panel each for Array, d1, d2, d3, d4, d5, d6]
The boxplots show the protein or gene expression distribution across samples. The x-axis shows samples in input order. The y-axis shows log2-transformed protein or gene expression. Samples from different classes are shown in different colors.
[Figures: one panel each for Array, d1, d2, d3, d4, d5, d6]
The density plots show the protein or gene expression distribution across samples. The x-axis shows log2-transformed protein or gene expression. The y-axis shows density.
In these figures, each row and each column is a sample. The color indicates the correlation between the two samples. The samples are ordered by batch.
[Figures, two sets: one panel each for Array, d1, d2, d3, d4, d5, d6]
The missing value distribution gives an overview of the percentage of missing values across all proteins or genes in both the QC and experimental samples.
[Figures, three sets: one panel each for Array, d1, d2, d3, d4, d5, d6]
The table below summarizes the evaluation. ‘diff’ is Cor(intra) - Cor(inter), that is, the IntraComplex column minus the InterComplex column. ‘ks’ is the Kolmogorov-Smirnov test statistic.
| dataSet | InterComplex | IntraComplex | diff | ks |
|---|---|---|---|---|
| Array | 0.008 | 0.086 | 0.078 | 0.175 |
| d1 | 0.033 | 0.164 | 0.131 | 0.242 |
| d2 | 0.003 | 0.108 | 0.105 | 0.213 |
| d3 | 0.016 | 0.182 | 0.166 | 0.289 |
| d4 | 0.010 | 0.171 | 0.161 | 0.282 |
| d5 | 0.062 | 0.180 | 0.118 | 0.211 |
| d6 | 0.032 | 0.163 | 0.131 | 0.244 |
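To make the ‘diff’ and ‘ks’ columns concrete, the sketch below computes both for two small lists of correlation values. It is an illustration only: the toy numbers are invented, and summarising each group by its median before taking the difference is an assumption, since the report states only that ‘diff’ is Cor(intra) - Cor(inter).

```python
from bisect import bisect_right
from statistics import median


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_x(t) - F_y(t)|.

    The empirical CDFs are step functions, so it is enough to
    compare them at every observed value.
    """
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)  # share of xs that are <= t
        fy = bisect_right(ys, t) / len(ys)  # share of ys that are <= t
        d = max(d, abs(fx - fy))
    return d


# Hypothetical pairwise correlations (not taken from the report).
intra = [0.10, 0.25, 0.30, 0.45]   # pairs within the same complex
inter = [-0.20, 0.00, 0.05, 0.15]  # pairs from different complexes

# Assumption: each group is summarised by its median.
diff = median(intra) - median(inter)
ks = ks_statistic(intra, inter)
print(round(diff, 3), ks)
```

A larger ‘ks’ means the intra-complex and inter-complex correlation distributions are further apart.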
| dataSet | n | n5 | n6 | n7 | n8 | median_cor |
|---|---|---|---|---|---|---|
| Array | 8312 | 1379 | 504 | 115 | 10 | 0.302 |
| d1 | 9129 | 1911 | 773 | 210 | 21 | 0.329 |
| d2 | 9131 | 1989 | 837 | 222 | 22 | 0.335 |
| d3 | 9129 | 1993 | 823 | 223 | 24 | 0.335 |
| d4 | 9129 | 2006 | 837 | 225 | 24 | 0.337 |
| d5 | 9129 | 1764 | 693 | 185 | 20 | 0.321 |
| d6 | 9129 | 1931 | 763 | 207 | 20 | 0.328 |
A model was built to predict class: LumA vs. LumB.
| dataSet | Variables | ROC | Sens | Spec |
|---|---|---|---|---|
| Array | 17814 | 0.910 | 0.789 | 0.818 |
| d1 | 18694 | 0.995 | 0.947 | 1.000 |
| d2 | 18717 | 0.993 | 0.947 | 0.909 |
| d3 | 18694 | 0.987 | 0.947 | 0.909 |
| d4 | 18694 | 0.987 | 0.947 | 0.955 |
| d5 | 18694 | 0.993 | 0.947 | 0.909 |
| d6 | 18694 | 0.988 | 0.895 | 0.909 |
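The ROC, Sens and Spec columns are standard classification metrics. The sketch below shows how they can be computed from predicted scores and true labels. It covers only the metrics: the report does not say how the model is trained or validated, and the toy scores, the 0.5 cutoff, and the choice of which class counts as positive are assumptions.

```python
def auroc(scores, labels):
    """AUROC as the chance that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity when score >= threshold predicts positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities; label 1 stands for one class
# (say LumA) and 0 for the other (say LumB). This coding is an assumption.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]

roc = auroc(scores, labels)
sens, spec = sens_spec(scores, labels)
print(round(roc, 3), round(sens, 3), round(spec, 3))
```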
In this evaluation, each dataset was used to build a co-expression network. For a selected network and a selected function term (such as a GO term or KEGG pathway), the proteins/genes annotated to the term and also included in the network were defined as the positive set, and all other proteins/genes in the network formed the negative set for that term. For each selected function term, some of the proteins/genes are used as seeds, and a random walk algorithm calculates scores for the other proteins/genes. A higher score means a closer relationship between that protein/gene and the seeds. Finally, for each selected function term, we calculate an AUROC to evaluate the prediction performance.
| Function term | Array | d1 | d2 | d3 | d4 | d5 | d6 |
|---|---|---|---|---|---|---|---|
| Allograft rejection | 0.922 | 0.993 | 0.975 | 0.99 | 0.987 | 0.991 | 0.99 |
| Aminoacyl-tRNA biosynthesis | 0.663 | 0.804 | 0.807 | 0.759 | 0.797 | 0.809 | 0.816 |
| Antigen processing and presentation | 0.867 | 0.814 | 0.796 | 0.835 | 0.858 | 0.83 | 0.862 |
| Asthma | 0.921 | 0.92 | 0.858 | 0.957 | 0.957 | 0.908 | 0.906 |
| Autoimmune thyroid disease | 0.973 | 0.964 | 0.942 | 0.94 | 0.938 | 0.965 | 0.949 |
| Cell adhesion molecules (CAMs) | 0.72 | 0.81 | 0.756 | 0.815 | 0.786 | 0.827 | 0.818 |
| Complement and coagulation cascades | 0.75 | 0.822 | 0.824 | 0.895 | 0.856 | 0.766 | 0.814 |
| DNA replication | 0 | 0.89 | 0.87 | 0.843 | 0.876 | 0.897 | 0.859 |
| Drug metabolism - other enzymes | 0.813 | 0.595 | 0.579 | 0.657 | 0.577 | 0.607 | 0.629 |
| ECM-receptor interaction | 0.839 | 0.876 | 0.827 | 0.838 | 0.838 | 0.848 | 0.851 |
| Glycosphingolipid biosynthesis - lacto and neolacto series | 0.755 | 0.847 | 0.708 | 0.704 | 0.701 | 0.776 | 0.795 |
| Glycosylphosphatidylinositol(GPI)-anchor biosynthesis | 0.82 | 0.664 | 0.642 | 0.672 | 0.653 | 0.703 | 0.72 |
| Graft-versus-host disease | 0.935 | 0.99 | 0.982 | 0.99 | 0.992 | 0.988 | 0.99 |
| Homologous recombination | 0 | 0.842 | 0.755 | 0.739 | 0.763 | 0.8 | 0.773 |
| Intestinal immune network for IgA production | 0.873 | 0.853 | 0.859 | 0.89 | 0.881 | 0.833 | 0.816 |
| Malaria | 0.826 | 0.838 | 0.828 | 0.841 | 0.842 | 0.835 | 0.806 |
| Metabolism of xenobiotics by cytochrome P450 | 0.8 | 0.647 | 0.749 | 0.779 | 0.79 | 0.789 | 0.769 |
| Mismatch repair | 0 | 0.857 | 0.769 | 0.778 | 0.85 | 0.821 | 0.837 |
| Oxidative phosphorylation | 0.791 | 0.822 | 0.757 | 0.841 | 0.828 | 0.819 | 0.835 |
| Parkinsons disease | 0.736 | 0.811 | 0.698 | 0.805 | 0.785 | 0.786 | 0.797 |
| Primary immunodeficiency | 0.759 | 0.782 | 0.767 | 0.781 | 0.786 | 0.826 | 0.806 |
| Proteasome | 0.895 | 0.865 | 0.8 | 0.892 | 0.912 | 0.849 | 0.895 |
| Protein export | 0.807 | 0.845 | 0.769 | 0.876 | 0.837 | 0.765 | 0.839 |
| Retinol metabolism | 0.754 | 0.707 | 0.812 | 0.849 | 0.822 | 0.878 | 0.824 |
| Ribosome | 0.935 | 0.943 | 0.869 | 0.949 | 0.932 | 0.942 | 0.94 |
| Ribosome biogenesis in eukaryotes | 0.81 | 0.71 | 0.718 | 0.746 | 0.757 | 0.694 | 0.723 |
| RNA polymerase | 0 | 0.762 | 0.709 | 0.805 | 0.774 | 0.794 | 0.728 |
| RNA transport | 0.852 | 0.746 | 0.724 | 0.74 | 0.745 | 0.732 | 0.731 |
| Spliceosome | 0.699 | 0.776 | 0.772 | 0.803 | 0.812 | 0.772 | 0.794 |
| Staphylococcus aureus infection | 0.874 | 0.953 | 0.922 | 0.94 | 0.933 | 0.914 | 0.951 |
| Steroid hormone biosynthesis | 0.607 | 0.793 | 0.795 | 0.831 | 0.791 | 0.847 | 0.791 |
| Systemic lupus erythematosus | 0.89 | 0.828 | 0.85 | 0.854 | 0.854 | 0.838 | 0.836 |
| Terpenoid backbone biosynthesis | 0.811 | 0.721 | 0.725 | 0.775 | 0.773 | 0.698 | 0.776 |
| Type I diabetes mellitus | 0.82 | 0.885 | 0.859 | 0.873 | 0.889 | 0.896 | 0.904 |
| Viral myocarditis | 0.782 | 0.773 | 0.754 | 0.829 | 0.788 | 0.757 | 0.795 |
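The seed-and-score procedure described before this table can be sketched on a toy network. The code below uses a random walk with restart, which is one common form of random walk scoring; whether OmicsEV uses this exact variant is not stated in the report. The graph, the restart probability, the choice of seed, and all function names are assumptions for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adj maps each node to {neighbour: edge weight} for an undirected graph.
    """
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    degree = {v: sum(adj[v].values()) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for v in nodes:
            # Probability mass flowing into v from each neighbour u.
            inflow = sum(p[u] * w / degree[u] for u, w in adj[v].items())
            nxt[v] = (1 - restart) * inflow + restart * p0[v]
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


def auroc(pos_scores, neg_scores):
    """Chance that a positive outscores a negative (ties count half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


def add_edge(adj, u, v, w=1.0):
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w


# Hypothetical co-expression network: two triangles joined by one edge.
adj = {}
for u, v in [("a", "b"), ("a", "c"), ("b", "c"),
             ("c", "x"), ("x", "y"), ("x", "z"), ("y", "z")]:
    add_edge(adj, u, v)

term_members = {"a", "b", "c"}  # positive set for a made-up function term
seeds = {"a"}                   # assumption: one member is used as the seed

scores = random_walk_with_restart(adj, seeds)
held_out = sorted(term_members - seeds)
negatives = sorted(set(adj) - term_members)
auc = auroc([scores[v] for v in held_out], [scores[v] for v in negatives])
print(auc)
```

In this toy network the held-out term members sit next to the seed, so they score above every node in the other triangle. Real networks are far noisier, which is why the per-term values in the table fall below 1.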